A Mutually Beneficial Integration of Data Mining and Information Extraction

Authors

  • Un Yong Nahm
  • Raymond J. Mooney
Abstract

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX that combines IE and data mining methodologies both to perform text mining and to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup.
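The idea of mining rules from an extracted database and using them to predict further fillers can be illustrated with a minimal sketch. This is not the DISCOTEX implementation: it assumes a simple one-antecedent association-rule miner as a stand-in for whatever rule learner the system uses, and the function names (`mine_rules`, `predict_additional`), the thresholds, and the toy job-posting records are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine one-antecedent rules `a -> b` from sets of extracted items.

    Hypothetical helper: a simple stand-in for a real rule learner.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = set(record)
        item_counts.update(items)
        # Count every ordered pair (a, b) that co-occurs in this record.
        pair_counts.update(permutations(sorted(items), 2))

    rules = {}
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / item_counts[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def predict_additional(extracted, rules):
    """Return items implied by the rules but missing from `extracted`."""
    extracted = set(extracted)
    return {b for (a, b) in rules if a in extracted and b not in extracted}


# Hypothetical toy "database" of slot fillers extracted from job postings.
records = [
    {"language=SQL", "area=database"},
    {"language=SQL", "area=database", "platform=Windows"},
    {"language=SQL", "area=database", "language=Java"},
    {"language=Java", "area=web"},
]

rules = mine_rules(records)

# An extractor found only one filler in a new posting; the mined rules
# suggest an additional filler it may have missed.
suggested = predict_additional({"language=SQL"}, rules)
print(suggested)
```

In this sketch, fillers suggested by the rules would be added to the extractor's output for a new document, which is how prediction could raise recall; any precision cost depends on the confidence threshold chosen.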

Similar resources

A Geometric View of Similarity Measures in Data Mining

The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. The concern is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...

ICS: An Interactive Classification System

Interactive data mining focuses on efficient and effective human-computer interactions for data analysis purposes. An interactive system is an integration of a human user and a computer. ICS, an interactive classification system, is implemented to demonstrate the power of interactive data mining. The interaction is mutually beneficial to users and machines. This article describes the arc...

Validation of Mixed-structured Data Using Pattern Mining and Information Extraction

For large-scale data mining utilizing data from ubiquitous and mixed-structured data sources, the appropriate extraction and integration into a comprehensive data-warehouse is of prime importance. Then, appropriate methods for validation and potential refinement are essential. This paper presents an approach applying data mining and information extraction methods for data validation: We apply s...

Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)

In this paper, we aim to improve association rules for use in recommender systems. Recommender systems provide a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information in order to offer users appropriate items. Among the data mining methods, finding frequent item sets and ...

Joint Bayesian Stochastic Inversion of Well Logs and Seismic Data for Volumetric Uncertainty Analysis

Herein, an application of a new seismic inversion algorithm in one of Iran’s oilfields is described. Stochastic (geostatistical) seismic inversion, as a complementary method to deterministic inversion, is perceived as a combination of geostatistics and seismic inversion algorithms. This method integrates information from different data sources with different scales, as prior informat...

Publication date: 2000